Abstract

Structure from motion often refers to the computation of 3D structure from a matched 
sequence of images. However, a (relative) depth map of a surface may not be a good 
representation for storage and recognition; a more concise representation seems necessary. 
The sign of the Gaussian curvature of a surface is one candidate to be a part of a useful 
representation of the surface. I will show that in order to compute the sign of the Gaussian 
curvature it is not necessary first to go through the computationally expensive and error 
sensitive process of recovering the exact function of the surface and the motion parameters. 

I will first show that the sign of the normal curvature in a given direction at a given 
point in the image can be computed from a simple difference of slopes of line-segments in 
one image. Using this result, local surface patches can be classified as convex, concave, 
parabolic (cylindrical), hyperbolic (saddle point) or planar. At the same time the transla- 
tional component of the optical flow is obtained, from which the focus of expansion can be 
computed. In addition, the axes of principal curvature and the axes of zero curvature are 
obtained. 

1 Introduction

When a scene is recorded from two (or more) different positions in space, objects are projected 
into different locations in each image. The disparity in position between the two images may 
be used to obtain the exact coordinates of objects if the motion of the camera relative to each 
object is known. This view of motion and stereo regards vision as a problem of inverse optics, 
namely, the goal is to find the inverse transformation of the optical imaging process (perspective 
projection). The computation is usually divided into two main steps. The first is correspon- 
dence: matching features in the two images to find the appropriate disparity in position for 
each object or feature. This may be a difficult computation for many image pairs. In stereo 
in particular it is considered the heart of the computational problem (e.g., [1]). Henceforth I 
will assume that matching is given. The second step is the determination of the motion (or 
camera) parameters that can be used to compute the distance to objects in space using geo- 
metrical transformation. This is, in general, a very difficult computation. I will discuss some 
important higher level goals for which it can be avoided. For these limited goals solving the 
second subproblem may be unnecessary. 

The problem of computing the motion parameters from motion disparities or optical flow 
(local velocities) has received much attention. The corresponding problem of camera calibration 
in stereo, however, is often ignored. The attention given to the motion parameters is often motivated by the assumption that
this computation is a prerequisite for higher level tasks such as navigation or recognition. For
example, for the computation of a complete 3D structure from motion the motion parameters 
should be known. Structure from motion results often deal mainly with the minimal number of 
points that are necessary to compute the inverse transformation (see [2]). For this purpose it has 
been shown that 7 or 8 matched points in two views ([3] and [4]) or 5 points and their velocities 
in one view ([5]) are sufficient. The actual algorithms, however, are typically computationally 
expensive and sensitive to noise. It is hard to guarantee a sufficiently good estimation of the 
motion parameters to maintain small errors in the structure computation (see [6]). 

General motion can be decomposed into a rotation around some axis followed by a trans- 
lation. In a similar way the optical flow vector can be decomposed into two components: one 
due to the translation component of the motion and one due to the rotation component. In 
perspective projection and if the motion is translation only, the optical flow takes a very simple 
form: straight lines that intersect at a single point, the focus of expansion (FOE), see figure 1. 
This point is the projection of the point towards which (or away from which) the camera's 
motion is directed. If the motion is rotational only, the flow field takes the form of concentric 
circles (see figure 1). It has been argued that if we can identify the two components of the flow
field then the problem is almost solved: the direction of motion and the relative depth of all points
can be computed from the translational component of the motion (see [7]).
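
The radial structure of a purely translational flow field can be checked numerically. The following Python sketch is an illustration only, not part of the original computation: it assumes a pinhole camera at the origin with unit focal length and a scene that translates rigidly relative to the camera, and the function names are hypothetical.

```python
def project(point):
    """Perspective projection with unit focal length (an assumption)."""
    X, Y, Z = point
    return (X / Z, Y / Z)


def foe_of_translation(t):
    """Image of the translation direction, i.e. the focus of expansion."""
    tx, ty, tz = t
    return (tx / tz, ty / tz)


def radial_residual(point, t):
    """Cross product of the image displacement with the vector from the FOE.

    The value is zero when the displacement lies on the line through the FOE.
    """
    p0 = project(point)
    p1 = project(tuple(c + d for c, d in zip(point, t)))
    foe = foe_of_translation(t)
    flow = (p1[0] - p0[0], p1[1] - p0[1])
    radial = (p0[0] - foe[0], p0[1] - foe[1])
    return flow[0] * radial[1] - flow[1] * radial[0]


translation = (2.0, -2.0, 10.0)   # hypothetical translation of the scene
scene = [(5.0, 3.0, 50.0), (-8.0, 1.0, 40.0), (0.5, -6.0, 65.0)]
residuals = [radial_residual(p, translation) for p in scene]
```

Each residual vanishes up to rounding, so every displacement vector points along a line through the same image point, the FOE.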

Because of the practical difficulties in devising a robust algorithm that will find a complete 
solution of the problem, the need for a more qualitative approach to motion analysis and to 
vision in general has been expressed (e.g., [8], [9] and [10]). It has been motivated in part by 
the experimentally plausible hypothesis that human vision does not compute the exact inverse 
mapping of the projection of a 3D world onto a 2D retina. In addition for many purposes, such 
as navigation, it has been shown that the complete solution of the motion parameters may not 
be necessary (e.g., [11] and [12]). The computation of an exact 3D structure may not even be 
necessary for recognition. The exact 3D coordinates of a surface do not seem to be a good
representation for either storage or recognition (see [13]); a more concise representation seems
necessary. The sign of the Gaussian curvature of a surface is one candidate to be a part of a
useful representation of the surface. In accordance with this view, Koenderink and van Doorn
(see [14], [15] and [16]) have proposed an alternative theoretical approach to the analysis of
stereo and motion (assuming matching is given). They show how various qualitative properties
of objects and the motion field are related to invariants of a vector field (the optical flow or
stereo disparity field).

Figure 1: An example of an optical flow, left: translation only, right: rotation only.

In this work I will discuss some motion and shape characteristics that can be computed 
directly from motion and stereo disparities with a very simple operator. It is not necessary to 
go first through the computationally difficult and error sensitive process of recovering the exact 
function of the surface and the motion parameters. Thus additional errors in the computation 
caused by using motion parameters that have been obtained from noisy data are avoided. It 
should be noted that the computation of the shape features discussed here is not immediate 
even when a complete 3D reconstructed surface is given (see [13] and [17]). 

First, the sign of the normal curvature of a curve on a surface is computed by following
three points on the curve that are collinear in one image. If the points remain collinear in
the other image, the normal curvature is 0. In forward motion, if the smaller angle created
by the three points in the other image is turned towards the focus of expansion (FOE), the
sign is negative. If the smaller angle is turned away from the FOE, the sign is positive. In
backward motion the sign reverses. Note that the direction of the normal to the surface is not
needed for this computation. Perspective projection is assumed, since otherwise the focus
of expansion is not defined, but its effect on motion disparities can be either large or negligible
(as in the orthographic projection limit).

Regardless of the location of the FOE, this simple operator can be computed at a selected set 
of directions around a point to determine the sign of the Gaussian curvature of a local surface 
patch, an intrinsic property of the surface. From this analysis, the direction of the translational 
component of the motion is immediately obtained. From this component it is possible to obtain 
the focus of expansion (FOE). The location of the FOE can be used to complete the classification
of local surface patches as convex, concave, parabolic (cylindrical), hyperbolic (saddle point)
or planar. In addition, the directions of the axes of zero curvature, and hence the directions of 
the principal axes, are also immediately obtained from this computation. The analysis does not 
depend upon special constraints on the nature of objects in the environment, such as assuming 
smoothly curved surfaces or a particular analytic representation of the surface. 

The rest of the paper is organized as follows. In section 2 I review the basic differential 
geometry concepts of normal curvature and Gaussian curvature and their potential usefulness 
for object representation. In section 3 I show how surfaces are classified and the focus of 
expansion is computed as described above. In stereo the ambiguity of a region with positive 
Gaussian curvature can be resolved without additional computations, as shown in section 3.3. In 
section 4 I show that the simple sign operator described in section 3 is almost as accurate in the 
presence of noise as the best algorithm that uses the 3D coordinates obtained from the same 
noisy data and using perfect motion parameters (i.e. uncorrupted inverse transformation). 
Since one would expect the noise to corrupt the motion parameters estimation significantly, 
the sign algorithm that uses 2D projections directly seems to be more robust. In section 5 I 
discuss the possible relevance of these results to biological vision. I also discuss the relation to 
some literature about structure from motion. The proofs of the results discussed in section 3 
are given in the appendix. 

2 Surface curvature and its importance to object representation

The normal curvature of a 3D curve on a regular surface through some point is its curvature 
with respect to the normal to the surface. That is, the curve is projected on a plane that 
includes the normal and its tangent (a normal section) and the curvature of the projected 
planar curve is the normal curvature of the original 3D curve, see figure 2. The curvature of a 
curve relative to the normal to the surface is what determines the curvature of the surface. For 
example, if all normal curvatures are negative, namely all the curves are convex relative to the 
normal, the surface is convex. If all are concave, the surface is concave. If some are convex and 
some concave, the surface is hyperbolic, i.e. it has a saddle point. 

The normal curvature of all the curves on the surface through some point can be written
as a linear combination of two principal curvatures $\kappa_1$ and $\kappa_2$. These are the curvatures of two
perpendicular curves on the surface, the principal axes, that obtain the extrema of the normal
curvatures of all curves on the surface passing through the same point. Let $\kappa_n$ denote the
normal curvature of some curve on the surface that makes an angle $\theta$ with the first principal
axis. Then

$$\kappa_n = \kappa_1 \cos^2\theta + \kappa_2 \sin^2\theta . \qquad (1)$$

Thus the local curvature of a local surface patch can be described in terms of two numbers only,
$\kappa_1$ and $\kappa_2$. The product of the two principal curvatures $\kappa_1 \cdot \kappa_2$ is called the Gaussian curvature
of the surface. It characterizes the surface independently of the environment.
The sign of the Gaussian curvature locally classifies the surface as follows: 

1. elliptic ($\kappa_1 \cdot \kappa_2 > 0$),

   • convex, see figure 3a-left ($\kappa_1, \kappa_2 < 0$)

   • concave, see figure 3a-right ($\kappa_1, \kappa_2 > 0$)

2. parabolic (cylindrical), see figure 3b ($\kappa_1 \cdot \kappa_2 = 0$, $\kappa_1 > 0$ or $\kappa_2 < 0$),

3. hyperbolic (saddle point), see figure 3c ($\kappa_1 \cdot \kappa_2 < 0$, i.e. $\kappa_1 > 0$ and $\kappa_2 < 0$),

4. planar ($\kappa_1 \cdot \kappa_2 = 0$, $\kappa_1 = \kappa_2 = 0$).

Figure 2: The normal curvature of a 3D curve u on a surface, whose tangent through P is w.
Below is the projection of the curve on the normal section. Left: a convex example (negative
curvature), right: a concave example (positive curvature).

Figure 3: An illustration of the different surface types used for classification of surfaces, see
text.

It follows from equation (1) that the number of asymptotes, or the number of curves on the 
surface with zero-curvature, determines the type of the surface. Namely, 

1. elliptic: no asymptote, 

2. parabolic: one asymptote, 

3. hyperbolic: two asymptotes, 

4. planar: infinite number of asymptotes. 
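
The two lists above can be sketched in Python. This is a minimal illustration, assuming the two principal curvatures are already known; the function names and the tolerance `eps` are hypothetical. Setting the normal curvature of equation (1) to zero gives tan²θ = -κ1/κ2, which has no solution in the elliptic case and two solutions in the hyperbolic case.

```python
import math


def classify_patch(k1, k2, eps=1e-12):
    """Classify a patch from its principal curvatures (sign convention of the text)."""
    zero1, zero2 = abs(k1) <= eps, abs(k2) <= eps
    if zero1 and zero2:
        return "planar"
    if zero1 or zero2:
        return "parabolic"
    if k1 * k2 < 0:
        return "hyperbolic"
    # Elliptic: negative curvatures are convex, positive ones concave.
    return "convex" if k1 < 0 else "concave"


def asymptote_angles(k1, k2, eps=1e-12):
    """Directions (degrees in [0, 180), measured from the first principal axis)
    along which the normal curvature of equation (1) is zero.

    Returns None for a planar patch, where every direction qualifies.
    """
    zero1, zero2 = abs(k1) <= eps, abs(k2) <= eps
    if zero1 and zero2:
        return None
    if zero1:
        return [0.0]      # the first principal axis itself has zero curvature
    if zero2:
        return [90.0]     # the second principal axis has zero curvature
    if k1 * k2 > 0:
        return []         # elliptic: no asymptote
    a = math.degrees(math.atan(math.sqrt(-k1 / k2)))
    return [a, 180.0 - a]  # hyperbolic: two asymptotes, symmetric about the axes
```

For example, `asymptote_angles(1.0, -1.0)` gives two directions at about 45° and 135°, halfway between the principal axes.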

Thus for surfaces where the asymptotes are locally straight lines on the surface, the number of 
straight lines on the surface that cross a point will determine the type of the surface. Various 
cues like intensity gradients (see [18]) can be used to determine whether a straight line in 
the image originated from a straight line on the surface (and thus of zero-curvature). Motion 
and stereo disparities help determine the sign of the curvature in between the zero-curvature 
directions which is necessary for surface classification (see section 3). 

The shape of most objects can be described by an analytic function of the surface, i.e. 
a relative depth map. For purposes of storage efficiency and recognition, a complete depth 
map seems wasteful. As a representation it is sensitive to viewing direction and noise; it is 
computationally expensive to match at a recognition stage; and it does not easily generalize to 
give a single representation for similar objects. One alternative is representing the shape of an 
object as a collection of parts where each part is described by a few surface features. Classifying 
regions as convex, concave, planar, cylindrical, or hyperbolic provides one important intrinsic 
surface feature. This classification can also help in finding part boundaries within an object 
(figure 4a) that occur often at parabolic lines. Often the axes of principal curvature and axes 
of zero-curvature, like parabolic lines that are the boundaries between different surface types, 
give important directions on the surface (figure 4b). 

3 Shape classification 

3.1 Surface curvature and FOE from motion disparities 

Henceforth perspective projection and a motion with nonzero translational component are assumed
so that the focus of expansion (see section 1) is defined. Under these conditions the
analysis holds at the orthographic projection limit (that is, the perspective projection has negligible
effect on the disparities yet the FOE is defined). In this limit the motion should not be
translation in depth only.

Figure 4: Why classify surfaces: a) the classification may help divide an object into parts; b)
axes of principal curvature are often meaningful curves on the surface. The dashed lines are 
parabolic lines. 



Proposition 1 Let $P_0$ denote a point on the surface of some object whose projection in the
first image is $O_0$. Let $P_1$ and $P_2$ denote two other points on the same surface whose projections
in the first image are $O_1$ and $O_2$, and where $O_0$, $O_1$ and $O_2$ are collinear. Let $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$
be the projections of the same three points in a second image. Assume the motion is backward
(away from the focus of expansion). Then the sign of the normal curvature of the curve $\xi$
passing through $P_0$, $P_1$ and $P_2$ can be determined as follows:

• if the smaller angle through $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$ is turned towards the focus of expansion then
the normal curvature of $\xi$ is positive (see figure 5a).

• if $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$ are collinear then the normal curvature of $\xi$ is 0 (see figure 5b).

• if the smaller angle through $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$ is turned away from the focus of expansion
then the normal curvature of $\xi$ is negative (see figure 5c).

In forward motion (towards the focus of expansion), the interpretation of the angle is reversed.
(The motion of the coordinate system is defined to be a rotation followed by a translation.)

A proof is given in the appendix. It consists of two steps. First, it is shown that the sign of the 
normal curvature, the sign of a curve's curvature relative to the normal to the surface, equals 
the sign of the curvature relative to the line of sight in the first image. Thus the direction of the 
normal is not needed for this computation. Second, it is shown that the sign of the curvature 
relative to the line of sight equals the sign of the curvature relative to the line through the FOE 
and the curve in almost any 2D perspective projection of the curve, e.g. in the second image. 

Figure 6 illustrates the implication of proposition 1. In a concave region, three collinear 
points in the first image will move to three non-collinear points in the second image turning 
towards the focus of expansion. 

In practice I compute the difference of the slopes of the line segments through $\hat{O}_0$ and $\hat{O}_1$
and through $\hat{O}_0$ and $\hat{O}_2$, angles $\beta_1$ and $\beta_2$ in figure 5a. Thus, if $\hat{O}_i = (\hat{x}_i, \hat{y}_i)$, the sign operator
is

$$T = \frac{\hat{y}_2 - \hat{y}_0}{\hat{x}_2 - \hat{x}_0} - \frac{\hat{y}_1 - \hat{y}_0}{\hat{x}_1 - \hat{x}_0} \qquad (2)$$

Figure 5: The sign of the normal curvature is determined by the relation between the angle
through three points in the second image, that are collinear in the first image, and the focus of
expansion. Above is the first image, where $O_0$, $O_1$ and $O_2$ are collinear. Below are the corresponding
points in the second image, $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$: a) the normal curvature is positive, b) the normal
curvature is 0, c) the normal curvature is negative.

Figure 6: In a concave region, collinear points (left) move to noncollinear points (right) that
are turning towards the focus of expansion.
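
The sign operator of equation (2) is a one-line computation. The Python sketch below is illustrative: the function names and the optional noise threshold are hypothetical, and the slopes are undefined when a segment is vertical in the second image, a case that is not handled here.

```python
def sign_operator(o0, o1, o2):
    """Difference of slopes in the second image, as in equation (2).

    o0, o1, o2 are (x, y) image positions of the three tracked points.
    """
    (x0, y0), (x1, y1), (x2, y2) = o0, o1, o2
    return (y2 - y0) / (x2 - x0) - (y1 - y0) / (x1 - x0)


def sign(value, threshold=0.0):
    """Sign of the operator, with values inside the threshold treated as 0."""
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0


# Three points that stayed collinear, and two triples that bent in opposite ways.
t_collinear = sign_operator((0.0, 0.0), (1.0, 1.0), (-1.0, -1.0))
t_bent_down = sign_operator((0.0, 0.0), (1.0, 1.0), (-1.0, -0.8))
t_bent_up = sign_operator((0.0, 0.0), (1.0, 1.0), (-1.0, -1.2))
```

The operator only reports which way the three points bend. Which sign of the normal curvature this corresponds to depends on the position of the FOE, as discussed next.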



The dependence of the relation between the sign of T and the sign of the normal curvature 
on the location of the FOE is summarized in the following proposition (proof is given in the 
appendix): 

Proposition 2 Choose $O_1$ and $O_2$ so that they are collinear with $O_0$ and lie on different sides
of $O_0$. Assume backward motion (the motion is defined now as a translation followed by a
rotation). If $O_1$ is chosen such that the angle through $O_1$, $O_0$ and the FOE going clockwise is
smaller than 180°, that is, $O_1$ is above the sign-bisector in figure 5, then the sign of T equals
the sign of the normal curvature of $\xi$. If $O_1$ is chosen so that the angle is larger than 180° then
the sign of T is opposite to the sign of the normal curvature of $\xi$. If the angle equals 180° then
the sign of T is identically 0.

One result of proposition 2 is that if $O_1$ is chosen around $O_0$ in all orientations between 0° and
360°, the correlation between the sign of T and the sign of the normal curvature reverses at
the orientation where $O_1$, $O_0$ and the FOE are collinear (r in figure 5). The direction where
T changes sign will be used later to compute the direction of the translational component of
the motion at $P_0$.

Now it is possible to classify the surface near a point $P_0$ using the following simple algorithm:
In the first image, for each direction r from a sample set of directions around $O_0$ (see upper
part of figure 5) choose two points in the image, $O_1$ and $O_2$, on both sides of $O_0$ so that they are
collinear and $O_1$ defines a slope r. It is assumed that $O_1$ and $O_2$ are the projections of points
lying on the same surface as $P_0$. Choose $O_1$ at all orientations r around $O_0$, 0° < r < 360°.
Compute T(r) for all r. Then:

• T(r) changes sign twice (see figure 7 above) => surface is elliptic,

• T(r) changes sign twice and obtains the value 0 for some other directions r and r + 180°
without changing sign => surface is parabolic,

• T(r) = 0 => surface is planar,

• T(r) changes sign six times (see figure 7 below) => surface is hyperbolic.

(The sign changes four times at axes of zero-curvature and twice at the sign-bisector.)
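
The counting rule above can be sketched in Python. This is an illustration under stated assumptions: T(r) is given as a list of samples covering a full turn, the threshold and function names are hypothetical, and the test signal `synthetic_t` is only a stand-in built from equation (1) with its sign flipped across an assumed sign-bisector; it is not the operator computed from real images.

```python
import math


def count_sign_changes(values, threshold):
    """Count sign changes around the closed cycle, ignoring near-zero samples."""
    signs = [1 if v > 0 else -1 for v in values if abs(v) > threshold]
    if not signs:
        return 0
    return sum(1 for a, b in zip(signs, signs[1:] + signs[:1]) if a != b)


def touches_zero(values, threshold):
    """True if some near-zero sample lies between nonzero samples of equal sign."""
    n = len(values)
    if all(abs(v) <= threshold for v in values):
        return False
    for i in range(n):
        if abs(values[i]) > threshold:
            continue
        before = next(values[(i - k) % n] for k in range(1, n)
                      if abs(values[(i - k) % n]) > threshold)
        after = next(values[(i + k) % n] for k in range(1, n)
                     if abs(values[(i + k) % n]) > threshold)
        if before * after > 0:
            return True
    return False


def classify_without_foe(values, threshold=1e-9):
    """Classify a patch from samples of T(r) over 360 degrees, FOE unknown."""
    if all(abs(v) <= threshold for v in values):
        return "planar"
    changes = count_sign_changes(values, threshold)
    if changes == 6:
        return "hyperbolic"
    if changes == 2:
        return "parabolic" if touches_zero(values, threshold) else "elliptic"
    return "undetermined"


def synthetic_t(k1, k2, r0_deg=30.0):
    """Stand-in for T(r): normal curvature with its sign flipped across r0."""
    out = []
    for deg in range(360):
        r = math.radians(deg)
        kn = k1 * math.cos(r) ** 2 + k2 * math.sin(r) ** 2
        out.append(math.sin(r - math.radians(r0_deg)) * kn)
    return out
```

With sampled data a zero-touching is only detected if a sample falls within the threshold of zero, so the sampling step and the threshold have to be chosen together.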

In the presence of noise, some threshold should be used instead of 0, which may cause regions 
whose curvature is low to be classified as planar. 

The sign of T(r) is ambiguous when the location of the FOE is not known. It gives the
sign of the normal curvature for a range $r_0 < r < r_0 + 180^\circ$ for some $r_0$ and the inverse sign
for other values of r. The direction $r_0$ is denoted sign-bisector (see figure 7). It is the direction
where T(r) changes sign independently of the normal curvature.

The same $r_0$ gives the direction of the translational component of the motion at $P_0$. This
motion component can be used to obtain the focus of expansion and relative depth. In the
elliptic case it is the only direction along which T(r) changes sign (figure 7). All such lines
at angles $r_0(P_0)$ for different points $P_0$ intersect at a single point, the FOE (see figure 8). In
the hyperbolic case, T(r) changes sign at three directions (six orientations), as is illustrated in
figure 9 right. Two are axes of zero-curvature and a third is the translational component of the
motion. The third axes of sign change of T(r) at all the points intersect at the FOE.
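
One way to intersect the sign-bisector lines is a least-squares fit, sketched below in Python. The text does not say how the intersection was computed, so the method, the function name `estimate_foe` and the sample coordinates are all assumptions made for illustration.

```python
import math


def estimate_foe(points, directions_deg):
    """Least-squares intersection of lines given by a point and a direction.

    Each line passes through an image point at the angle of its sign-bisector.
    The result minimises the sum of squared perpendicular distances to the lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), deg in zip(points, directions_deg):
        t = math.radians(deg)
        nx, ny = -math.sin(t), math.cos(t)   # unit normal of the line
        c = nx * px + ny * py                # line equation: n . x = c
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; the FOE is not determined")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical data: three image points whose bisectors point at (0.2, -0.2).
foe_true = (0.2, -0.2)
points = [(0.1, 0.06), (-0.2, 0.025), (0.0, -0.1)]
directions = [math.degrees(math.atan2(foe_true[1] - py, foe_true[0] - px))
              for px, py in points]
foe_estimate = estimate_foe(points, directions)
```

Two points with different bisector directions already determine the FOE; using more points averages out noise in the estimated directions.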

The location of the FOE can be used to complete the surface classification with T(r) if $O_1$
is chosen so that the angle between $O_1$, $O_0$ and the FOE when going clockwise is smaller than
180°. The classification algorithm is now:

• $T(r) = 0\ \forall r$ => surface is planar.

• $T(r) > 0\ \forall r$ => surface is concave.

• $T(r) < 0\ \forall r$ => surface is convex.

• $T(r) \geq 0\ \forall r$ or $T(r) \leq 0\ \forall r$ => surface is parabolic (cylindrical). The axis of zero
curvature is the axis for which T(r) = 0.

• T(r) changes sign => surface is hyperbolic. In this case the asymptotes are the directions
for which T(r) = 0. The principal directions (direction of minimum and maximum
curvature) are the lines that cross the two angles defined by the asymptotes.
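
The five rules above translate directly into code. The following Python sketch is illustrative and assumes that the samples of T(r) were taken with the first point on the side of the sign-bisector where, for backward motion, the sign of T(r) equals the sign of the normal curvature; the function names and the threshold are hypothetical.

```python
import math


def classify_with_foe(values, threshold=1e-9):
    """Classify a patch from disambiguated samples of T(r)."""
    positive = any(v > threshold for v in values)
    negative = any(v < -threshold for v in values)
    has_zero = any(abs(v) <= threshold for v in values)
    if not positive and not negative:
        return "planar"
    if positive and negative:
        return "hyperbolic"
    if has_zero:
        return "parabolic"
    return "concave" if positive else "convex"


def normal_curvature_samples(k1, k2):
    """Hypothetical test signal: equation (1) sampled every degree over 180 degrees."""
    return [k1 * math.cos(math.radians(d)) ** 2 + k2 * math.sin(math.radians(d)) ** 2
            for d in range(180)]
```

For instance, `classify_with_foe(normal_curvature_samples(-1.0, -2.0))` returns `"convex"`, since every sample is negative.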

Note that this classification is done without the computation of the normal to the surface. 

To summarize, by computing the sign of T(r) for all 0° < r < 360° we can classify a surface 
as elliptic, hyperbolic, planar, or parabolic. At each point we also obtain the direction of the 
translational component of the motion. By using more than one point we are able to compute 
the location of the focus of expansion and thus further classify an elliptic region as convex or 
concave. In a hyperbolic region we obtain at each point three axes, two of which are axes of 
zero-curvature and one is the translational component of the motion. From the two axes of 
zero curvature we can compute the principal axes, the axes of minimal and maximal curvature, 
that are the two angle bisectors of the two axes of zero-curvature. 
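
The bisector construction for the hyperbolic case is short enough to write out. In this Python sketch the function name is hypothetical and directions are treated as undirected lines, so angles are taken modulo 180°.

```python
def principal_axes_from_asymptotes(a1_deg, a2_deg):
    """Return the two bisectors (degrees in [0, 180)) of two asymptote directions."""
    first = ((a1_deg + a2_deg) / 2.0) % 180.0
    second = (first + 90.0) % 180.0
    return (first, second)


axes = principal_axes_from_asymptotes(45.0, 135.0)
```

The two asymptotes alone do not say which bisector carries the maximal curvature and which the minimal one; the sign of T(r) between the asymptotes is needed for that.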

3.2 Examples: 

Synthetic objects (a sphere and a torus) have been classified using the following algorithm: 
For each pixel (denoted $P_0$) in the first image that belongs to the object:

1. for each r in the range -90° < r < 90°, with 1° increments:

(a) find two points on both sides of $P_0$ that belong to the object and so that the three
points are collinear with slope r.

(b) find the coordinates of the three points in the second image by computing the motion 
transformation. 

(c) compute T(r). 

2. count the number of zero-curvature axes: 

(a) count the number of zero-crossings of T(r). 

(b) count the number of zero-touchings of T(r). 

(c) add the two numbers and subtract 1 (for $r_0$, see figure 7).

(d) save the zero-crossings and the zero-touchings. The single zero-crossing in the
parabolic and elliptic cases is the translation component of the motion at $P_0$. The
zero-touching in the parabolic case is the axis of zero curvature. The three zero-crossings
in the hyperbolic case are the translation component of the motion at $P_0$
and the two axes of zero curvature.

3. classify $P_0$ as elliptic, parabolic, planar or hyperbolic according to the number of axes of
zero-curvature.

4. Classify further an elliptic point: 

(a) if the location of the FOE is not known and more than two points have already been 
analyzed, compute the location of the FOE. Go to the next point if the location 
of the FOE is not known or if it is not known whether the motion is backward or 
forward. 

(b) take the sign of T(r) at r = 90°. 

(c) reverse the sign if forward motion. 

(d) reverse the sign if the x coordinate of $P_0$ is smaller than the x coordinate of the
FOE.

(e) if the final sign is negative then the surface is convex, otherwise it is concave.
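
Steps 1(a) to 1(c) can be imitated on synthetic data. The Python sketch below is an illustration under stated assumptions rather than the program used for the figures: a pinhole camera with unit focal length at the origin, a motion made of a rotation about the Y-axis only followed by the translation (2, -2, 10), a hypothetical control plane, and hypothetical helper names.

```python
import math


def back_project_sphere(u, v, center=(0.0, 0.0, 50.0), radius=20.0):
    """Nearest intersection of the viewing ray through (u, v) with a sphere."""
    q = (u, v, 1.0)
    qq = sum(a * a for a in q)
    qc = sum(a * b for a, b in zip(q, center))
    cc = sum(a * a for a in center)
    s = (qc - math.sqrt(qc * qc - qq * (cc - radius * radius))) / qq
    return tuple(s * a for a in q)


def back_project_plane(u, v, normal=(0.3, -0.2, 1.0), offset=50.0):
    """Intersection of the viewing ray through (u, v) with the plane n . P = offset."""
    q = (u, v, 1.0)
    s = offset / sum(a * b for a, b in zip(normal, q))
    return tuple(s * a for a in q)


def move_and_project(point, angle_deg=-20.0, translation=(2.0, -2.0, 10.0)):
    """Rotate about the Y-axis, translate, then project into the second image."""
    x, y, z = point
    a = math.radians(angle_deg)
    xr = math.cos(a) * x + math.sin(a) * z
    zr = -math.sin(a) * x + math.cos(a) * z
    X, Y, Z = xr + translation[0], y + translation[1], zr + translation[2]
    return (X / Z, Y / Z)


def t_of_direction(back_project, o0, direction_deg, step=0.05):
    """Slope-difference operator for three first-image points collinear at direction_deg."""
    dx = step * math.cos(math.radians(direction_deg))
    dy = step * math.sin(math.radians(direction_deg))
    first = [o0, (o0[0] + dx, o0[1] + dy), (o0[0] - dx, o0[1] - dy)]
    (x0, y0), (x1, y1), (x2, y2) = [move_and_project(back_project(u, v))
                                    for u, v in first]
    return (y2 - y0) / (x2 - x0) - (y1 - y0) / (x1 - x0)


directions = range(-60, 61, 15)
t_sphere = [t_of_direction(back_project_sphere, (0.05, 0.02), r) for r in directions]
t_plane = [t_of_direction(back_project_plane, (0.05, 0.02), r) for r in directions]
```

For the plane every value in `t_plane` vanishes up to rounding, since collinear points on a plane stay collinear in any perspective view. For the sphere the three points bend, so `t_sphere` contains clearly nonzero values.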

The first example is a synthetic sphere. The motion of the sphere was a translation of 
(2, -2, 10), rotation of 15° around the X-axis, rotation of -20° around the Y-axis, and rotation 
of 5° around the Z-axis. The center of the sphere was initially located at (0,0,50), with radius 
20. It had moved 2.7° of arc. The zero-crossing of T(r), i.e. the translation component of the 
motion, is shown in figure 8 at three arbitrary points on the sphere. The three zero-crossings
intersect at the FOE. Figure 8 also illustrates the resulting classification: all the points on 
the sphere have been correctly classified as convex which is shown by the particular grey level 
assigned to all of them. 

The second example is a synthetic torus. The motion of the torus was the same as that 
of the sphere. The center of the torus was initially located at (0,0,50), with large radius 10 
and small radius 5. The zero-crossings of T(r) are shown in figure 9 at four arbitrary points
on the torus, two elliptic points and two hyperbolic points. Figure 9 illustrates the resulting
classification: the torus has been correctly classified as being composed of a convex region on
the outside and a hyperbolic region on the inside. The two classes are marked by different grey
levels. Note the emergence of the parabolic line on the torus (the line separating the hyperbolic 
region from the convex region, whose type is parabolic). It is often argued that these parabolic 
lines are important for image representation (see [17]). 

3.3 Surface curvature from stereo disparities 

With general motion we had to know the location of the focus of expansion to disambiguate 
completely the sign of T(r) at a single point. The least we had to do was to repeat the analysis 
in more than one point in order to locate the focus of expansion. This computation is useful by 
itself, since the location of the focus of expansion is important for other purposes like navigation. 
However, we can use the limited knowledge on the relative location of the two cameras that is
available if the two images are obtained as a stereo pair. In this case it is possible to obtain
at each point the coordinates up to a scaling factor (like in perspective projection) in a new
coordinate system whose focus of expansion is fixed: it is the origin of the coordinate system in
either of the cameras. Thus it will be sufficient to apply the sign operator at a single point to
be able to classify it fully, namely, disambiguating the elliptic case to convex or concave.

Figure 8: Classification of a sphere from two images taken in motion, see text. Left: the optical
flow vectors do not intersect and do not reveal much about the motion. Right: the translational
components of the motion field intersect at the focus of expansion.

Figure 9: Classification of a torus from two images taken in motion, see text. The final classification
is shown by the shading: light grey for hyperbolic and dark grey for convex. Left:
the optical flow vectors, right: the zero-crossings of T(r): the axes of zero curvature and the
translation component of the motion.

I make the following assumptions: given two cameras, assume that the principal rays intersect
at a fixation point. Assume also that the plane that passes through both cameras and the
fixation point includes the X-axes of both cameras. The following coordinate system will be
used (see figure 10): let the fixation point be the origin, the plane through the origin and the
two cameras be the X-Z plane, and the line perpendicular to this plane through the origin be
the Y-axis. On the X-Z plane, the principal rays of both cameras intersect at the origin and
create an angle $2\mu$ between them. Let the Z-axis be the angle-bisector of $2\mu$, and the X-axis
perpendicular to the Z-axis.

Figure 10: Above, the 3D coordinate system defined by two cameras. Below, the image plane
of the right camera. Point O is the projection in the image plane of the 3D point P. Its polar
coordinates $R$ and $\phi$ are shown.

Let P = z(x, y, 1) in the new coordinate system. Let $(R_l, \phi_l)$ and $(R_r, \phi_r)$ be the projections
in polar coordinates of P on the left and right images respectively (see figure 10). Then the
following holds (see [19]):

$$x = \tan\mu \cdot \frac{\cot\phi_r + \cot\phi_l}{\cot\phi_r - \cot\phi_l} , \qquad y = \frac{2\sin\mu}{\cot\phi_r - \cot\phi_l}$$
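
The relation between the polar angles and the scaled coordinates (x, y) can be checked with a small round trip in Python. The sketch is illustrative and rests on an assumed sign convention: the right camera's principal ray is rotated by -μ about the Y-axis and the left camera's by +μ, and both image y-axes are parallel to the Y-axis. The function names are hypothetical.

```python
import math


def cot_polar_angle(point, camera_angle):
    """cot of the polar angle of the image of `point` in a camera whose
    principal ray passes through the origin, rotated by `camera_angle`
    about the Y-axis. The depth divisor cancels in the ratio u / v.
    """
    X, Y, Z = point
    u = X * math.cos(camera_angle) - Z * math.sin(camera_angle)
    return u / Y


def stereo_xy(cot_r, cot_l, mu):
    """Scaled coordinates (x, y) = (X / Z, Y / Z) from the two polar angles."""
    diff = cot_r - cot_l
    x = math.tan(mu) * (cot_r + cot_l) / diff
    y = 2.0 * math.sin(mu) / diff
    return x, y


mu = math.radians(15.0)            # half of a 30 degree convergence angle
P = (3.0, 2.0, 5.0)                # hypothetical point, relative to the fixation point
x, y = stereo_xy(cot_polar_angle(P, -mu), cot_polar_angle(P, +mu), mu)
```

The recovered pair agrees with X/Z = 0.6 and Y/Z = 0.4 up to rounding. Note that only the polar angles enter the computation, not the radii or the distance of the cameras from the fixation point.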

Now the "first" image in the previous section will be one of the two actual images and 
the "second" image will be the perspective projection in the coordinate system defined above. 
Thus the focus of expansion in the first image is the origin of the camera. The sign bisector
at direction $r_0$, the orientation along which T(r) changes its sign regardless of the sign of the
normal curvature, is the line connecting $O_0$ to the origin. Therefore the sign of T(r) can be
directly used to obtain the sign of the normal curvature. For convenience, I compute T(r) as if
the perspective projection in the second coordinate system is on the X-Z plane, a modification
that does not affect any of the underlying arguments. Thus,



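$$T_{st}(r) = \frac{(\cot\phi_r^1 - \cot\phi_r^0) - (\cot\phi_l^1 - \cot\phi_l^0)}{(\cot\phi_r^1 - \cot\phi_r^0) + (\cot\phi_l^1 - \cot\phi_l^0)} - \frac{(\cot\phi_r^2 - \cot\phi_r^0) - (\cot\phi_l^2 - \cot\phi_l^0)}{(\cot\phi_r^2 - \cot\phi_r^0) + (\cot\phi_l^2 - \cot\phi_l^0)} \qquad (3)$$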
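
The stereo operator of equation (3) can be sketched in Python as a difference of two slopes, each formed from the cotangents of the polar angles of one neighbouring point and of the central point. The example is illustrative: the camera convention (right camera rotated by -μ, left by +μ about the Y-axis), the order of the two slopes and the function names are assumptions.

```python
import math


def cot_polar_angle(point, camera_angle):
    """cot of the polar angle of the image of `point` (assumed camera convention)."""
    X, Y, Z = point
    return (X * math.cos(camera_angle) - Z * math.sin(camera_angle)) / Y


def stereo_sign_operator(cots_r, cots_l):
    """Difference of slopes from right/left cotangents of points 0, 1 and 2."""
    def slope(i):
        dr = cots_r[i] - cots_r[0]
        dl = cots_l[i] - cots_l[0]
        return (dr - dl) / (dr + dl)
    return slope(1) - slope(2)


def operator_for_points(points, mu):
    cots_r = [cot_polar_angle(p, -mu) for p in points]
    cots_l = [cot_polar_angle(p, +mu) for p in points]
    return stereo_sign_operator(cots_r, cots_l)


mu = math.radians(15.0)
# Three points on a straight 3D line, and a triple where the third point is moved off it.
t_line = operator_for_points([(3.0, 2.0, 5.0), (4.0, 2.5, 7.0), (2.0, 1.5, 3.0)], mu)
t_bent = operator_for_points([(3.0, 2.0, 5.0), (4.0, 2.5, 7.0), (2.0, 1.5, 4.0)], mu)
```

Points on a straight 3D line give a zero response, as expected for a curve of zero curvature, while the bent triple gives a nonzero value whose sign indicates the direction of bending.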



If $O_0 = (x_0, y_0)$, then $r_0 = \arctan\frac{y_0}{x_0}$. Thus if $O_1 = (x_1, y_1)$ is chosen so that
$\arctan\frac{y_0}{x_0} < \arctan\frac{y_1}{x_1} < (\arctan\frac{y_0}{x_0} + 180^\circ)$ then from proposition 2 the sign of $T_{st}(r)$ gives the sign of
the normal curvature unambiguously. The same algorithm can now be used to classify surface
patches from stereo disparities.

The classification algorithm used in the following examples is as in section 3.2, with the
following differences:

1. $T_{st}(r)$ is computed instead of T(r).

2. the sign of T(r) at r = 90° needs to be reversed only if the signs of the x- and y-coordinates
of P are opposite (here we use the fact that the effective FOE is located at the origin of
the coordinate systems).

3. using the origin as the FOE, the zero-crossings of $T_{st}(r)$ that correspond to the two zero-curvature
axes in the hyperbolic case and the single zero curvature axis in the parabolic
case are isolated, from which the maximum and minimum curvature axes are immediately
obtained. In the elliptic case the axis of minimum curvature is estimated by the r that
minimizes $T_{st}(r)$, and the maximum curvature axis is the perpendicular axis.

Figure 11 shows classification results for synthetic data of a torus, a cylinder, a cone, a 
hyperbola and a sphere. All the objects but the torus were centered at (20, 20, 50) (in the above 
coordinate system) with the other parameters set to 4. The torus was centered at (20,20,20), 
with big radius 8 and small radius 4. The convergence angle of the camera ($2\mu$) was 30°. The
distance between both cameras and the fixation point was 150 for the torus and 50 for the other 
objects. The shadings are explained in the legend of the figure. The results are accurate both 
for surface classification and the directions of the principal and the zero axes. 

3.4 The computation of a 1D curvature

We have computed the sign of the surface curvature at a point by computing the sign of the 
curvature of curves whose tangents span all directions in the tangent plane of the surface at the 
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Figure 11: First row, left: a sphere, middle: a torus, right: inside a torus. Second row, left: a 
hyperbola, middle: a tilted cylinder, right: a double cone. The shadings mean the following: 
surface classification: the lightest grey marks hyperbolic regions (internal rings in both toruses, 
the hyperbola), darker shade of grey marks parabolic regions (cylinder and cone), darker grey 
marks convex regions (sphere, external ring of torus), and the darkest grey marks concave 
regions (external ring of inner torus); 

axes: white marks axes of zero-curvature, grey marks axes of minimum curvature or maximal 
negative curvature, and black marks axes of maximal curvature. 
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point. Each of these curves was defined by three points on the surface and had the property 
that the projections of the three points in the first image were collinear. In this case the sign of 
the 2D curvature of the projections of the three points in the other image relative to the FOE 
gave the sign of the 3D curvature of the 3D curve on the surface. This is also the sign of the 
normal curvature at the direction of the tangent to this 3D curve. 
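
A minimal sketch of this test, assuming image points are given as (x, y) tuples and the FOE is known. The function name `curvature_sign`, the `backward` flag and the use of cross products are illustrative choices, not the implementation used for the figures. It follows the proof of Proposition 1 in the appendix: the line from the FOE through the middle projection is intersected with the line through the two outer projections, and the sign depends on which side of the middle projection the intersection falls.

```python
def cross(a, b):
    """z-component of the 2D cross product."""
    return a[0] * b[1] - a[1] * b[0]


def curvature_sign(o0, o1, o2, foe, backward=True):
    """Sign (+1, 0, -1) of the normal curvature from second-image projections.

    o0 is the middle projection, o1 and o2 the outer ones. Assumes the three
    points are close together, so the intersection lies near o0.
    """
    e = (o2[0] - o1[0], o2[1] - o1[1])        # direction of the line o1-o2
    d = (o0[0] - foe[0], o0[1] - foe[1])      # direction from the FOE to o0
    side = cross(e, (o0[0] - o1[0], o0[1] - o1[1]))
    along = cross(e, d)
    if along == 0:
        raise ValueError("line o1-o2 is parallel to the FOE direction")
    prod = side * along                        # > 0: intersection between FOE and o0
    sign = (prod > 0) - (prod < 0)
    return sign if backward else -sign         # forward motion reverses the rule
```

With the FOE at the origin and `o0 = (10, 0)`, the outer points `(9, 1)` and `(9, -1)` form an angle turned towards the FOE and give +1, while `(11, 1)` and `(11, -1)` give -1.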

This scheme can be generalized to estimate the sign of the curvature (though not the normal
curvature) of other 3D curves defined by three points in the two images. A generalized rule
would be the following: let $\alpha$ be the 2D angle between three points in the first image (see
figure 12), where 0° < $\alpha$ < 90° if the angle is turned towards the FOE in that image and
90° < $\alpha$ < 180° otherwise. Then for backward motion, if $\alpha$ increases from the first image to the
second, the sign of the curvature is positive; otherwise it is negative (figure 12a). This
generalized rule yields the correct sign in many cases. Figure 12b illustrates the deterioration in
performance when the angle $\alpha$ between the three points in the first image, which measures the
deviation from collinearity, increases.



[Figure 12. Panel a: the first image and the second image, each with the FOE marked. Panel b:
horizontal axis "2D curvature (angle in degrees)".]

Figure 12: a) an example of the change in the 2D curvature of three points originating from a
concave curve in 3D from one image to the next. In the upper and middle examples 0° < $\alpha$ < 90°,
in the lower example 90° < $\alpha$ < 180°. b) The generalized rule (see text) is not exact; its
performance deteriorates with the amount of deviation from collinearity in the first image ($\alpha$
in the text). In this example the motion is a translation of (10, 0, 10), rotation of 10° around
the X-axis, rotation of −10° around the Y-axis, and rotation of 10° around the Z-axis.



4 Sensitivity to errors 

Small errors in the data, due to quantization of discrete data and to noise, have quite
devastating effects on the estimation of local surface type. This is true for any algorithm;
therefore the data (either disparities or reconstructed depth) have to be substantially smoothed
before the surface type can be meaningfully computed. To estimate the error rate before
smoothing, I compute the percentage of correct evaluations of the sign of the normal curvature in
all directions over the whole surface (that is, at the same data points that were used for the
previous classification examples).

The error rate is first computed for the simple 2D algorithm described in section 3. It is
compared to the error rate of the best alternative algorithm (both before smoothing). This
algorithm estimates the 3D coordinates of a matched pair by the point closest to two 3D rays,
each passing through one camera and the projection of the feature on its image. (These rays
ideally intersect at the exact location of the feature in 3D.) The algorithm uses perfect
knowledge of the motion or camera parameters; therefore the usually large errors introduced while
computing these parameters from the noisy data itself are artificially avoided. As expected,
when the recursive error due to the computation of the motion parameters from noisy data is
eliminated, the best exact algorithm does better than the 2D algorithm, but not much better.
The results of the comparison are given in table 1 for stereo and table 2 for motion. Data are
given for different objects, different resolution levels (measured in the number of pixels in the
intervals $\|O_1 - O_0\|$ or $\|O_2 - O_0\|$), and different noise levels (where the standard deviation
is measured in percent of the intervals $\|O_1 - O_0\|$ or $\|O_2 - O_0\|$).
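
The reconstruction step of the comparison algorithm (the point closest to two 3D rays) can be sketched as follows. This is one common reading, the midpoint of the shortest segment joining the two rays, and is an assumption rather than the exact estimator used for the tables; the function name `closest_point_to_rays` is hypothetical. Each ray is given by a camera centre and a direction through the feature's image projection.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def closest_point_to_rays(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if denom == 0:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / denom          # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom          # parameter of the closest point on ray 2
    q1 = [p + s * v for p, v in zip(c1, d1)]
    q2 = [p + t * v for p, v in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(q1, q2)]
```

With noise-free data the two rays intersect and the midpoint is the feature itself; with noisy projections the rays are skew and the midpoint serves as the estimate.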







object       resolution   noise SD   error rate,     error rate,          difference
                                     2D algorithm    best 3D algorithm
cylinder     30%   5%   ...   26%   6%
hyperbola    10           -          15%             9%                   6%
hyperbola    5            -          26%             17%                  9%
torus        10           -          23%             19%                  4%
torus        50           -          6%              3%                   3%
torus        -            10%        41%             32%                  9%
torus        -            4%         26%             18%                  8%



Table 1: Curvature from stereo: the first column gives the object type, the second column
gives the resolution (see text) if it is finite, and the third column gives the standard deviation
of the noise in percent (see text) if there is any. The next two columns give the error rate
for the 2D algorithm described in the previous section and for the best 3D algorithm using exact
motion parameters (see text). The last column gives the difference in error rates between the
two algorithms.

In the 2D curvature from motion algorithm, small angles of curvature may be classified as
zero-curvature when the resolution is finite. Such directions are ignored in the computation of
the error rate. For finite resolution I compute the error rate in two cases: the first subcolumn in
table 2 is the regular error rate as before; the second subcolumn in table 2 is the error rate if the
task is performed with hyperacuity that is an order of magnitude better than visual acuity. If a
biological visual system uses its ability to compute the orientation of three points with an order
of magnitude higher precision than visual acuity (Vernier acuity), then the second subcolumn
may give a better comparison for its error rate (see section 5).
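
The bookkeeping described here can be sketched as follows, assuming predicted and true signs are stored as +1, -1 or 0 per direction. The function name `sign_error_rate` is hypothetical, and returning `None` when every direction is skipped is an illustrative choice.

```python
def sign_error_rate(predicted, actual):
    """Percentage of wrong sign estimates, ignoring directions that the 2D
    algorithm classified as zero-curvature (predicted sign 0)."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if p != 0]
    if not pairs:
        return None                       # nothing left to score
    wrong = sum(1 for p, a in pairs if p != a)
    return 100.0 * wrong / len(pairs)
```

For instance, with predictions `[1, 0, -1, 1, 1]` against true signs that are all `+1`, the zero entry is skipped and one of the four remaining estimates is wrong.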
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8% 


46% 


— 


42%

Table 2: Curvature from general motion: translation (10, −10, 10) and rotation 15° around the
X-axis, −20° around the Y-axis, and 5° around the Z-axis. The columns are as in table 1,
with the difference that two error rates are given for the 2D approximate algorithm in the finite
resolution cases (see text).

5 Discussion 

The curvature operators described in section 3 can be implemented by a biological system 
with high precision. From Proposition 1 we see that the operator that gives the sign of the 
normal curvature has to check whether three points are collinear or otherwise how the angle 
between them is oriented. This is an example of a hyperacuity task (see [20], p. 337, for a
review), namely, the precision with which it can be done is ten times higher than the visual 
acuity. Thus, the biological system may be capable of computing the sign of the curvature 
directly, without recourse to an operator similar to T. Because of the hyperacuity resolution, 
the expected error rate, which is already of the same order of magnitude as the error rate of 
the best 3D algorithm that uses known motion parameters, should be significantly lower (see 
table 2). Also, the algorithm that computes shape type involves only line operators at different 
orientations. This is consistent with known biological architectures. 

Koenderink and van Doorn ([14]) showed that some important features, the sign of the
Gaussian curvature for example, are related to motion invariants of vector fields (e.g., shear).
These results are derived using vector field analysis and therefore assume the existence of a
differentiable vector field (though singularities are addressed in [21]). The results are less
general in that the curvature is assumed to be large relative to the distance to the object, and
the angular part of the rotation is assumed small. It is also not clear how the appropriate
vector field invariants can be computed. Finally, the sign of the Gaussian curvature does not
provide a complete classification of surfaces with respect to the viewer (i.e., the distinction
convex/concave). I have shown above that some interesting quantities (the sign of the Gaussian
curvature and the absolute sign of the normal curvature) can be computed with simple
hyperacuity detectors at different orientations. The analysis is exact; the only approximation
is in the computation of the curvature of a planar curve using discrete data. (It is interesting
to note here that Koenderink and van Doorn [16] have suggested the use of difference of slopes
of line segments to approximate the shear of the stereo vector field. This is in fact the operator
used above (equation 2) to determine the sign of the normal curvature.)

One can also regard the 2D algorithm of section 3 as a way to compute the direction of 
motion: the focus of expansion and the direction of the translational component of the mo- 
tion. The location of the FOE is obviously important for navigation, and (the exact value of) 
the translational component of the optical flow can give relative depth. Longuet-Higgins and 
Prazdny ([7]) have shown that these quantities can be computed from the optical flow and 
described two algorithms to compute them. Their algorithm (the one using dense data) com- 
putes the exact value of the translational component of the optical flow, not only its direction. 
Some of its drawbacks are the following: it is computationally expensive and noise-sensitive; 
it assumes that the surface function is smooth enough so that it can be approximated by the 
linear terms of X and Y; and it is biologically implausible. Altogether, it is given more as an 
existence proof that the computation of the motion parameters and structure from motion are 
possible from images only. The approximate algorithm of section 3 shows that if we do not 
require a complete computation of the motion parameters then some important features of the 
motion can be computed more easily and in parallel, more reliably, and by a more biologically 
plausible algorithm. It can also be used before a more exact algorithm to obtain an initial 
estimate of the location of the FOE and the translational component of the motion. 

6 Summary 

This work has been motivated by two observations. First, the computation of the motion 
parameters or the cameras' calibration is generally complicated, time consuming and error 
sensitive. Second, it is not clear that biological vision needs such a computation or that it uses 
the exact recovery of the depth of a surface at each point. From the analysis presented above 
we can conclude that the direct computation of some interesting motion and shape invariants 
from matched images may be computationally easier, more parallel in nature, and more robust 
in the presence of errors. More specifically, it has been shown that the sign of the Gaussian 
curvature of a surface patch can be obtained from motion or stereo disparities with a simple, 
biologically plausible, operator. The focus of expansion can also be obtained from this analysis. 
The surface can further be classified as convex, concave, planar, cylindrical, or saddle-point. If
a sufficient number of interesting quantities can be computed in a similar way (which depends,
of course, on the goal of the computation), the exact motion parameters and shape need not
be computed at all. This may be the case for the limited purposes of biological vision, such as
recognition and navigation.

7 Appendix 

Following are the proofs of the propositions in section 3. 

Proposition 1 Let $P_0$ denote a point on the surface of some object whose projection in the
first image is $\hat{O}_0$. Let $P_1$ and $P_2$ denote two other points on the same surface whose projections
in the first image are $\hat{O}_1$ and $\hat{O}_2$, and where $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$ are collinear. Let $O_0$, $O_1$ and $O_2$
be the projections of the same three points in a second image. Assume the motion is backward
(away from the focus of expansion). Then the sign of the normal curvature of the curve $C$
passing through $P_0$, $P_1$ and $P_2$ can be determined as follows:

• if the smaller angle through $O_0$, $O_1$ and $O_2$ is turned towards the focus of expansion then
the normal curvature of $C$ is positive (see figure 5a).

• if $O_0$, $O_1$ and $O_2$ are collinear then the normal curvature of $C$ is 0 (see figure 5b).

• if the smaller angle through $O_0$, $O_1$ and $O_2$ is turned away from the focus of expansion
then the normal curvature of $C$ is negative (see figure 5c).

In forward motion (towards the focus of expansion), the interpretation of the angle is reversed.
(The motion of the coordinate system is defined to be a rotation followed by a translation.)

Proof: 

Let $w_\theta$ denote the tangent of the curve $C$ whose sign of curvature we want to estimate. The
tangents to all the curves on the surface passing through $P_0$ must lie in the tangent plane at $P_0$
at some angle $\theta$ (e.g., $w_\theta$ in figure 13). The normal curvature of any curve with tangent $w_\theta$ is
equal to the exact curvature of a single curve with tangent $w_\theta$. This curve is the intersection of
the normal section, the plane through $w_\theta$ and the normal $N$, with the surface (see figure 13). Let
$u_\theta$ be the intersection of the normal plane and the surface. It is therefore sufficient to compute
the curvature of $u_\theta$ to obtain the normal curvature of $C$. Let $n_\theta$ denote the curvature of $u_\theta$.

Consider the lower part of figure 13. Let $N_0$ be some arbitrary axis through $P_0$ that creates a
sharp angle with $N$ (that is, $N \cdot N_0 > 0$). We define an $N_0$-section in a similar way to the normal
section: it is the plane that passes through $N_0$ and the tangent line $w_\theta$. The corresponding
$N_0$-section intersects the surface at a curve $u^0_\theta$. Let $n^0_\theta$ be the curvature of $u^0_\theta$; $n^0_\theta$ lies in the
$N_0$-section. Since $n^0_\theta$ is perpendicular to $w_\theta$, it lies along the projection of $N$ on the $N_0$-section,
either in the direction of this projection or opposite to it. Since the angle between $N$ and $N_0$ is
sharp, so is the angle between $N_0$ and the projection of $N$ on the $N_0$-section. Thus the sign of
$n^0_\theta$ with respect to $N_0$ (the sign of $n^0_\theta \cdot N_0$) is equal to its sign with respect to the projection
of $N$ on the $N_0$-section. This, in turn, has the same sign as its sign with respect to $N$ (the sign
of $n^0_\theta \cdot N$), which is the sign of the normal curvature. Therefore the sign of $n^0_\theta$ with respect to
$N_0$ (the sign of $n^0_\theta \cdot N_0$) is equal to the sign of the normal curvature corresponding to $w_\theta$ (the
sign of $n_\theta \cdot N$).

The argument reverses when applied to an axis $N_0$ that creates an obtuse angle with $N$
(that is, $N \cdot N_0 < 0$). It breaks down if $N$ and $N_0$ are perpendicular ($N \cdot N_0 = 0$), a case
for which the proposition does not hold.

The first image is depicted in figure 14. We choose the axis $N_0$ to be the line of sight, the line
connecting $P_0$ and the first camera. By definition the normal creates a sharp angle with the
line of sight unless it is a boundary where the two lines are perpendicular. For a given $w_\theta$, the
corresponding $N_0$-section (marked in figure 14 with continuous lines) includes $P_0$, $P_1$ and $P_2$
(three points on the surface as we have defined before), $\hat{O}_0$, $\hat{O}_1$ and $\hat{O}_2$ (their projections on
the first image), and the camera's pinhole. The curve $u^0_\theta$ is the line passing through $P_0$, $P_1$ and
$P_2$. We define $n^0_\theta$ to be the angle bisector of $\nu$, the angle defined by $P_1$, $P_0$ and $P_2$. [1] (Thus
we can think of $u^0_\theta$ as a smooth curve passing through $P_0$, $P_1$ and $P_2$ whose tangent at $P_0$ is
the line perpendicular to the angle bisector of $\nu$.) From the above discussion the sign of the
normal curvature is determined by whether $\nu$ is turned "towards" the camera or "away" from
it. Let $P'_0$ be the intersection of the line of sight and the line through $P_1$ and $P_2$ in the plane
of the $N_0$-section (see figure 14). Then the question is whether $P'_0$ is between the camera and
$P_0$ or on the other side of $P_0$.

[1] This definition can be justified in the following way. The direction of the normal to a plane
curve at some point $P_0$ is the radius of the circle of curvature, which is the limit of a circle
through $P_0$ and two neighboring points $P_1$ and $P_2$ as they approach $P_0$. If some fixed $P_1$ and
$P_2$ are equidistant to $P_0$, the radius of the circle passing through $P_1$, $P_0$ and $P_2$ is also the angle
bisector of the angle $\nu$ between them. Thus the angle bisector serves as a discrete estimator for
the direction of the normal given two points, just as difference operators serve to approximate
derivatives.

Figure 13: Illustration of the normal section (upper part) and the $N_0$-section (lower part), see
text.

[Figure 14 labels: $N_0$-section, first camera, second camera.]

Figure 14: The $N_0$-section: a plane containing $P_1$, $P_0$, $P_2$ and the first camera.

Perspective projection of the $N_0$-section, specifically $P_0$, $P_1$, $P_2$, $P'_0$ and the camera's pinhole,
preserves order if all points lie in the half space that is in the field of view of the projection
(or the other half space). Assume that the plane is not projected to a line, that is, the second
camera is not translating on the $N_0$-section, for which case the analysis does not hold. Thus
the question is whether the projection of $P'_0$ is between the projections of the camera's pinhole
and $P_0$ or on the other side of the projection of $P_0$. We choose the perspective projection on
the second image, where the $P_i$'s are projected to the $O_i$'s respectively, $P'_0$ is projected to $O'_0$,
and the camera is projected to the focus of expansion. Thus if $O'_0$ is between the FOE and $O_0$
then the normal curvature is positive, and if $O'_0$ is on the other side of $O_0$ then the curvature
is negative. If $O'_0 = O_0$ then $P_0$, $P_1$ and $P_2$ are collinear and the normal curvature is 0. This
completes the proof for backward motion, since then $P_0$, $P_1$, $P_2$ and the camera are all in the
field of view of the second camera. If the motion is forward then the first camera is not in the
field of view of the second camera. The axis $N_0$ (the line of sight) is projected discontinuously
and therefore the meaning of the angle through the projections of $P_1$, $P_0$ and $P_2$ reverses.
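
The footnote's claim, that for two neighbors equidistant from the middle point the angle bisector points along the radius of the circle through the three points, can be checked numerically. The sketch below is illustrative only; the helper names `unit` and `bisector` and the particular circle are assumptions.

```python
import math


def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def bisector(p0, p1, p2):
    """Direction of the bisector of the angle at p0 defined by p1, p0 and p2.
    It is the zero vector when the points are collinear with p0 in the middle."""
    u1 = unit((p1[0] - p0[0], p1[1] - p0[1]))
    u2 = unit((p2[0] - p0[0], p2[1] - p0[1]))
    return (u1[0] + u2[0], u1[1] + u2[1])


# Three points on a circle of radius 2 centred at the origin; p1 and p2 are
# at equal arc distance from p0, hence equidistant from it.
r, phi, step = 2.0, 0.3, 0.1
p0 = (r * math.cos(phi), r * math.sin(phi))
p1 = (r * math.cos(phi + step), r * math.sin(phi + step))
p2 = (r * math.cos(phi - step), r * math.sin(phi - step))
b = bisector(p0, p1, p2)
to_centre = (-p0[0], -p0[1])
parallel_error = b[0] * to_centre[1] - b[1] * to_centre[0]   # ~0 if parallel
same_direction = b[0] * to_centre[0] + b[1] * to_centre[1]   # > 0 if towards centre
```

Here `parallel_error` vanishes up to rounding and `same_direction` is positive, so the bisector points from the middle point towards the centre of the circle.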

Proposition 2 Let $\hat{O}_i$ be the projections in the first image as before, where $\hat{O}_1$ and $\hat{O}_2$ are
chosen on different sides of $\hat{O}_0$. Assume backward motion (the motion is defined now as a
translation followed by a rotation). Let $O_i = (x_i, y_i)$ denote the projections of $P_i$ respectively in
a second image, as before. Let

$$T = \frac{y_2 - y_0}{x_2 - x_0} - \frac{y_1 - y_0}{x_1 - x_0}.$$

If $\hat{O}_1$ is chosen such that the angle through $\hat{O}_1$, $\hat{O}_0$ and the FOE going clockwise
is smaller than 180°, that is, $\hat{O}_1$ is below the sign-bisector in figure 15a, then the sign of $T$
equals the sign of the normal curvature of $C$. If $\hat{O}_1$ is chosen so that the angle is larger than
180° then the sign of $T$ is opposite to the sign of the normal curvature of $C$. If the angle equals
180° then $T$ is identically 0.

Proof: 

From the previous proposition, the sign of the normal curvature is determined by whether $O'_0$
(the projection of the point where the line of sight meets the line through $P_1$ and $P_2$) is
between the FOE (the projection of the camera) and $O_0$ or on the other side of $O_0$ (see
figure 15b).

Figure 15: The perspective projection of the $N_0$-section assuming $P_1$, $P_0$, $P_2$ and the camera are
in the same field of view: a) first image, b) second image.

From its definition $T = \tan\beta_2 - \tan\beta_1$ (figure 15b). We know that $\hat{O}_1$ and $\hat{O}_2$ lie on
different sides of the sign-bisector. Assume for simplicity that $O_1$ and $O_2$ lie on different sides
of a parallel to the Y-axis through $O_0$ (figure 15b). If $O_1$ is below the sign-bisector in figure 15b
then the sign of $T$ is positive iff $O'_0$ is between the FOE and $O_0$ and negative iff $O'_0$ is on the
other side of $O_0$. That is, the sign of $T$ is equal to the sign of the normal curvature of $C$ if the
angle through $O_1$, $O_0$ and the FOE going clockwise is smaller than 180°. We have used the
previous proposition for backward motion when the motion is defined as rotation followed by
translation. If the motion is redefined as translation followed by rotation, and backward motion
is again assumed, then this condition is equivalent to the following: the sign of $T$ is equal to
the sign of the normal curvature of $C$ if the angle through $\hat{O}_1$, $\hat{O}_0$ and the FOE going clockwise
is smaller than 180°. In a similar way, the sign of $T$ is the opposite of the sign of the normal
curvature of $C$ if the angle through $\hat{O}_1$, $\hat{O}_0$ and the FOE going clockwise is larger than 180°. If
the angle through $\hat{O}_1$, $\hat{O}_0$ and the FOE equals 180°, $P_0$, $P_1$, $P_2$ and the camera are collinear
and therefore $T = 0$. This completes the proof of the proposition.

When $O_1$ and $O_2$ are both on the same side of a parallel to the Y-axis through $O_0$ the
problem can be easily fixed. This case is detected when the sign of $x_2 - x_0$ equals the sign of
$x_1 - x_0$. It is sufficient to push either $O_1$ or $O_2$ to be almost parallel to the Y-axis on the other
side ($\pm\infty$). Usually, though, the combined use of $T$ and $T^{-1}$ eliminates the problem.
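
A minimal sketch of the operator, reading $T$ as the difference of the slopes of the segments from the middle projection to the two outer projections in the second image (consistent with $T = \tan\beta_2 - \tan\beta_1$ in the proof). The function names `slope_difference` and `same_side` are hypothetical; the second implements the detection test for the problematic configuration.

```python
def slope_difference(o0, o1, o2):
    """T = slope of segment o0-o2 minus slope of segment o0-o1."""
    (x0, y0), (x1, y1), (x2, y2) = o0, o1, o2
    if x1 == x0 or x2 == x0:
        raise ValueError("segment parallel to the Y-axis: slope is undefined")
    return (y2 - y0) / (x2 - x0) - (y1 - y0) / (x1 - x0)


def same_side(o0, o1, o2):
    """True when o1 and o2 fall on the same side of the parallel to the
    Y-axis through o0, i.e. x1 - x0 and x2 - x0 have the same sign."""
    return (o1[0] - o0[0]) * (o2[0] - o0[0]) > 0
```

Collinear projections give `T = 0`, and exchanging the roles of the two outer points flips the sign of `T`, which is why the proposition ties the interpretation of the sign to the position of the first outer point relative to the FOE.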
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